Horton Tables: Fast Hash Tables for In-Memory Data-Intensive Computing
نویسندگان
چکیده
Hash tables are important data structures that lie at the heart of important applications such as key-value stores and relational databases. Typically bucketized cuckoo hash tables (BCHTs) are used because they provide highthroughput lookups and load factors that exceed 95%. Unfortunately, this performance comes at the cost of reduced memory access efficiency. Positive lookups (key is in the table) and negative lookups (where it is not) on average access 1.5 and 2.0 buckets, respectively, which results in 50 to 100% more table-containing cache lines to be accessed than should be minimally necessary. To reduce these surplus accesses, this paper presents the Horton table, a revamped BCHT that reduces the expected cost of positive and negative lookups to fewer than 1.18 and 1.06 buckets, respectively, while still achieving load factors of 95%. The key innovation is remap entries, small in-bucket records that allow (1) more elements to be hashed using a single, primary hash function, (2) items that overflow buckets to be tracked and rehashed with one of many alternate functions while maintaining a worst-case lookup cost of 2 buckets, and (3) shortening the vast majority of negative searches to 1 bucket access. With these advancements, Horton tables outperform BCHTs by 17% to 89%.
منابع مشابه
Cheap and Large CAMs for High Performance Data-Intensive Networked Systems
We show how to build cheap and large CAMs, or CLAMs, using a combination of DRAM and flash memory. These are targeted at emerging data-intensive networked systems that require massive hash tables running into a hundred GB or more, with items being inserted, updated and looked up at a rapid rate. For such systems, using DRAM to maintain hash tables is quite expensive, while on-disk approaches ar...
متن کاملStack and Queue Integrity on Hostile Platforms
When computationally intensive tasks have to be carried out on trusted, but limited, platforms such as smart cards, it becomes necessary to compensate for the limited resources (memory, CPU speed) by o loading implementations of data structures on to an available (but insecure, untrusted) fast co-processor. However, data structures such as stacks, queues, RAMS, and hash tables can be corrupted ...
متن کاملResizable, Scalable, Concurrent Hash Tables via Relativistic Programming
We present algorithms for shrinking and expanding a hash table while allowing concurrent, wait-free, linearly scalable lookups. These resize algorithms allow ReadCopy Update (RCU) hash tables to maintain constanttime performance as the number of entries grows, and reclaim memory as the number of entries decreases, without delaying or disrupting readers. We call the resulting data structure a re...
متن کاملCuckoo++ Hash Tables: High-Performance Hash Tables for Networking Applications
Hash tables are an essential data-structure for numerous networking applications (e.g., connection tracking, rewalls, network address translators). Among these, cuckoo hash tables provide excellent performance by allowing lookups to be processed with very few memory accesses (2 to 3 per lookup). Yet, for large tables, cuckoo hash tables remain memory bound and each memory access impacts perform...
متن کاملEfficient implementation of error correction codes in hash tables
Hash tables are one of the most commonly used data structures in computing applications. They are used for example to organize a data set such that searches can be performed efficiently. The data stored in a hash table is commonly stored in memory and can suffer errors. To ensure that data stored in a memory is not corrupted when it suffers errors, Error Correction Codes (ECCs) are commonly use...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2016